1.
Bioinformatics ; 40(3), 2024 Mar 04.
Article in English | MEDLINE | ID: mdl-38444086

ABSTRACT

MOTIVATION: KaMRaT is designed for processing large k-mer count tables derived from multi-sample RNA-seq data. Its primary objective is to identify condition-specific or differentially expressed sequences, regardless of gene or transcript annotation. RESULTS: KaMRaT is implemented in C++. Major functions include scoring k-mers based on count statistics, merging overlapping k-mers into contigs, and selecting k-mers based on their occurrence across specific samples. AVAILABILITY AND IMPLEMENTATION: Source code and documentation are available via https://github.com/Transipedia/KaMRaT.
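
To make the idea of "merging overlapping k-mers into contigs" concrete, here is a minimal Python sketch. It is not KaMRaT's C++ implementation: the function name `merge_kmers` is hypothetical, and the sketch assumes that two k-mers are merged only when they overlap by exactly k-1 bases and the extension is unambiguous. It ignores reverse complements and count vectors.

```python
def merge_kmers(kmers):
    """Greedily merge k-mers that overlap by k-1 bases into contigs.

    Assumption (illustrative only): an extension is made only when a k-mer
    has exactly one successor and that successor has exactly one predecessor.
    """
    kmers = sorted(set(kmers))
    by_prefix, by_suffix = {}, {}
    for km in kmers:
        by_prefix.setdefault(km[:-1], []).append(km)  # index by first k-1 bases
        by_suffix.setdefault(km[1:], []).append(km)   # index by last k-1 bases

    def unique_next(km):
        # Successors start with the last k-1 bases of km.
        nxt = by_prefix.get(km[1:], [])
        if len(nxt) == 1 and len(by_suffix[nxt[0][:-1]]) == 1:
            return nxt[0]
        return None

    def has_unique_prev(km):
        # Predecessors end with the first k-1 bases of km.
        prev = by_suffix.get(km[:-1], [])
        return len(prev) == 1 and len(by_prefix[prev[0][1:]]) == 1

    contigs, used = [], set()
    starts = [km for km in kmers if not has_unique_prev(km)]
    # The second pass over all k-mers picks up any left inside pure cycles.
    for start in starts + kmers:
        if start in used:
            continue
        contig, cur = start, start
        used.add(cur)
        while True:
            nxt = unique_next(cur)
            if nxt is None or nxt in used:
                break
            contig += nxt[-1]  # extend the contig by one base
            used.add(nxt)
            cur = nxt
        contigs.append(contig)
    return contigs
```

With this sketch, three chained 4-mers collapse into a single contig, while k-mers at a branch point stay separate because the extension would be ambiguous.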


Subjects
Algorithms ; Software ; Sequence Analysis, DNA/methods ; RNA-Seq ; Documentation
2.
NAR Genom Bioinform ; 5(4): lqad104, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38058589

ABSTRACT

The functions of eukaryotic chromosomes and their spatial architecture in the nucleus are reciprocally dependent. Hi-C experiments are routinely used to study chromosome 3D organization by probing chromatin interactions. Standard representation of the data has relied on contact maps that show the frequency of interactions between parts of the genome. In parallel, it has become easier to build 3D models of the entire genome based on the same Hi-C data, and thus benefit from the methodology and visualization tools developed for structural biology. 3D modeling of entire genomes leverages the understanding of their spatial organization. However, this opportunity for original and insightful modeling is underexploited. In this paper, we show how seeing the spatial organization of chromosomes can bring new perspectives to omics data integration. We assembled state-of-the-art tools into a workflow that goes from Hi-C raw data to fully annotated 3D models and we re-analysed public omics datasets available for three fungal species. Besides the well-described properties of the spatial organization of their chromosomes (Rabl conformation, hypercoiling and chromosome territories), our results highlighted (i) in Saccharomyces cerevisiae, the backbones of the cohesin anchor regions, which were aligned all along the chromosomes, (ii) in Schizosaccharomyces pombe, the oscillations of the coiling of chromosome arms throughout the cell cycle and (iii) in Neurospora crassa, the massive relocalization of histone marks in mutants of heterochromatin regulators. 3D modeling of the chromosomes brings new opportunities for visual integration of omics data. This holistic perspective supports intuition and lays the foundation for building new concepts.
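
As an illustration of the contact maps mentioned in this abstract, the following Python sketch bins pairs of interacting genomic positions into a symmetric matrix of interaction frequencies. It is a toy example, not part of the authors' workflow: the function name `contact_map`, the single-chromosome setting, and the 0-based positions are all assumptions.

```python
def contact_map(pairs, chrom_length, bin_size):
    """Count contacts between fixed-size bins of one chromosome.

    pairs: iterable of (pos_a, pos_b), 0-based positions < chrom_length.
    Returns a symmetric list-of-lists matrix of contact counts.
    """
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos_a, pos_b in pairs:
        i, j = pos_a // bin_size, pos_b // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # keep the map symmetric
    return matrix
```

Real Hi-C pipelines also normalize such matrices before any 3D modeling; that step is omitted here.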

3.
Mol Biol Evol ; 40(1), 2023 Jan 04.
Article in English | MEDLINE | ID: mdl-36508357

ABSTRACT

Interspecies RNA-Seq datasets are increasingly common and have the potential to answer new questions about the evolution of gene expression. Single-species differential expression analysis is now a well-studied problem that benefits from sound statistical methods. Extensive reviews on biological or synthetic datasets have provided the community with a clear picture of the relative performance of the available methods in various settings. However, synthetic dataset simulation tools are still missing in the interspecies gene expression context. In this work, we develop and implement a new simulation framework. This tool builds on both the RNA-Seq and the phylogenetic comparative methods literature to generate realistic count datasets while taking into account the phylogenetic relationships between the samples. We illustrate the usefulness of this new framework through a targeted simulation study that reproduces the features of a recently published dataset containing gene expression data in adult eye tissue across blind and sighted freshwater crayfish species. Using our simulated datasets, we perform a fair comparison of several approaches used for differential expression analysis. This benchmark reveals some of the strengths and weaknesses of both the classical and phylogenetic approaches for interspecies differential expression analysis, and allows for a reanalysis of the crayfish dataset. The tool has been integrated into the R package compcodeR, freely available on Bioconductor.


Subjects
Gene Expression Profiling ; Software ; RNA-Seq ; Phylogeny ; Gene Expression Profiling/methods ; Sequence Analysis, RNA/methods
4.
PeerJ ; 10: e14204, 2022.
Article in English | MEDLINE | ID: mdl-36353604

ABSTRACT

Background: Protein-protein interactions (PPIs) are essential to almost every process in a cell. Analysis of PPI networks gives insights into the functional relationships among proteins and may reveal important hub proteins and sub-networks corresponding to functional modules. Several good tools have been developed for PPI network analysis, but they have certain limitations. Most tools are suited to studying PPIs in only a small number of model species, and they neither allow second-order networks to be built nor offer relevant functions for their analysis. To overcome these limitations, we have developed APPINetwork (Analysis of Protein-protein Interaction Networks). The aim was to produce a generic and user-friendly package for building and analyzing a PPI network involving proteins of interest from any species, as long as they are stored in a database. Methods: APPINetwork is an open-source R package. It can be downloaded and installed from the collaborative development platform GitLab (https://forgemia.inra.fr/GNet/appinetwork). A graphical user interface facilitates its use. Graphical windows, buttons, and scroll bars allow the user to select or enter an organism name, and to choose data files and network parameters or methods dedicated to network analysis. All functions are implemented in R, except for the script identifying all proteins involved in the same biological process (developed in C) and the scripts formatting the BioGRID data file and generating the IDs correspondence file (implemented in Python 3). PPI information comes from private resources or different public databases (such as IntAct, BioGRID, and iRefIndex). The package can be deployed on Linux and macOS operating systems (OS). Deployment on Windows is possible but requires the prior installation of Rtools and Python 3. Results: APPINetwork allows the user to build a PPI network from selected public databases and add their own PPI data. In this network, the proteins have unique identifiers resulting from the standardization of the different identifiers specific to each database. In addition to the construction of the first-order network, APPINetwork offers the possibility of building a second-order network centered on the proteins of interest (proteins known for their role in the biological process studied or subunits of a protein complex) and provides the number and type of experiments that have highlighted each PPI, as well as references to articles containing experimental evidence. Conclusion: More than a tool for PPI network building, APPINetwork enables the analysis of the resultant network, by searching either for the community of proteins involved in the same biological process or for the assembly intermediates of a protein complex. Results of these analyses are provided in easily exportable files. Example files and a user manual describing each step of the process come with the package.
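
The difference between a first-order and a second-order network can be sketched in a few lines of Python. This is not APPINetwork code (the package itself is written in R): the function name `build_network` is hypothetical, and the sketch assumes that "second-order" means the proteins of interest, their direct interactors, and the interactors of those interactors, with every known interaction among the selected proteins kept as an edge.

```python
from collections import defaultdict

def build_network(interactions, proteins_of_interest, order=2):
    """Return (nodes, edges) of the network within `order` steps of the seeds.

    interactions: iterable of (protein_a, protein_b) pairs.
    Assumption: order=1 keeps direct interactors, order=2 adds theirs too.
    """
    adjacency = defaultdict(set)
    for a, b in interactions:
        adjacency[a].add(b)
        adjacency[b].add(a)

    selected = set(proteins_of_interest)
    frontier = set(proteins_of_interest)
    for _ in range(order):
        # Expand by one interaction step, keeping only newly reached proteins.
        frontier = {n for p in frontier for n in adjacency[p]} - selected
        selected |= frontier

    # Keep every interaction whose two partners were both selected.
    edges = {tuple(sorted((a, b))) for a, b in interactions
             if a in selected and b in selected}
    return selected, edges
```

For a chain A-B-C-D-E seeded with A, the first-order network contains A and B, and the second-order network adds C.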


Subjects
Protein Interaction Mapping ; Protein Interaction Maps ; Protein Interaction Mapping/methods ; Databases, Protein ; Software ; Proteins/metabolism
5.
NAR Cancer ; 4(1): zcac001, 2022 Mar.
Article in English | MEDLINE | ID: mdl-35118386

ABSTRACT

The identity of cancer cells is defined by the interplay between genetic, epigenetic, transcriptional and post-transcriptional variation. Much of this variation is present in RNA-seq data and can be captured at once using reference-free k-mer analysis. An important issue with k-mer analysis, however, is the difficulty of distinguishing signal from noise. Here, we use two independent lung adenocarcinoma datasets to identify all reproducible events at the k-mer level, in a tumor versus normal setting. We find reproducible events in many different locations (introns, intergenic, repeats) and forms (spliced, polyadenylated, chimeric, etc.). We systematically analyze events that are ignored in conventional transcriptomics and assess their value as biomarkers and for tumor classification, survival prediction, neoantigen prediction and correlation with the immune microenvironment. We find that unannotated lincRNAs, novel splice variants, endogenous HERV, Line1 and Alu repeats and bacterial RNAs each contribute to different, important aspects of tumor identity. We argue that differential RNA-seq analysis of tumor/normal sample collections would benefit from this type of k-mer analysis to cast a wider net on important cancer-related events. The code is available at https://github.com/Transipedia/dekupl-lung-cancer-inter-cohort.

6.
BMC Res Notes ; 15(1): 67, 2022 Feb 19.
Article in English | MEDLINE | ID: mdl-35183229

ABSTRACT

OBJECTIVES: Transcriptional regulatory modules are usually modelled via a network, in which nodes correspond to genes and edges correspond to regulatory associations between them. In the model yeast Saccharomyces cerevisiae, the topological properties of such a network are well described (distribution of degrees, hierarchical levels, organization in network motifs, etc.). To go further, our aim was to search for additional information arising from a new combination of classical representations of transcriptional regulatory networks with more realistic models of the spatial organization of the S. cerevisiae genome in the nucleus. RESULTS: Taking advantage of independent studies with high-quality datasets, i.e. lists of target genes for specific transcription factors and chromosome positions in a three-dimensional space representing the nucleus, particular spatial co-localizations of genes sharing common regulatory mechanisms were searched for. All transcriptional modules of S. cerevisiae, as described in the latest release of the YEASTRACT database, were analyzed, and significant biases toward co-localization were observed for a few sets of target genes. To help other researchers reproduce such analyses with any gene list of interest, an interactive web tool called 3D-Scere (https://3d-scere.ijm.fr/) is provided.


Subjects
Saccharomyces cerevisiae Proteins ; Saccharomyces cerevisiae ; Gene Expression Regulation, Fungal ; Gene Regulatory Networks ; Saccharomyces cerevisiae/genetics ; Saccharomyces cerevisiae/metabolism ; Saccharomyces cerevisiae Proteins/genetics ; Transcription Factors/genetics ; Transcription Factors/metabolism
7.
Metallomics ; 13(8), 2021 Aug 19.
Article in English | MEDLINE | ID: mdl-34320190

ABSTRACT

Plants have developed a diversity of strategies to take up and store essential metals in order to colonize various types of soils, including mineralized soils. Yet, our knowledge of the capacity of plant species to accumulate metals is still fragmentary across the plant kingdom. In this study, we used X-ray fluorescence technology to analyze metal concentration in a wide diversity of species of the Neotropical flora, which has not been extensively investigated so far. In total, we screened more than 11 000 specimens representing about 5000 species from herbaria in Paris and Cuba. Our study provides a broad overview of the accumulation of metals such as manganese, zinc, and nickel in the Neotropical flora. We report 30 new nickel hyperaccumulating species from Cuba, including the first records in the families Connaraceae, Melastomataceae, Polygonaceae, Santalaceae, and Urticaceae. We also identified the first species from this region of the world that can be considered manganese hyperaccumulators, in the genera Lomatia (Proteaceae), Calycogonium (Melastomataceae), Ilex (Aquifoliaceae), Morella (Myricaceae), and Pimenta (Myrtaceae). Finally, we report the first zinc hyperaccumulator, Rinorea multivenosa (Violaceae), from the Amazonas region. The identification of species able to accumulate high amounts of metals will be instrumental in supporting the development of phytotechnologies to limit the impact of soil metal pollution in this region of the world.


Subjects
Fluorescence ; Manganese/analysis ; Nickel/analysis ; Plants/metabolism ; Zinc/analysis ; X-Rays
8.
BMC Cancer ; 21(1): 394, 2021 Apr 12.
Article in English | MEDLINE | ID: mdl-33845808

ABSTRACT

BACKGROUND: RNA-seq data are increasingly used to derive prognostic signatures for cancer outcome prediction. A limitation of current predictors is their reliance on reference gene annotations, which amounts to ignoring large numbers of non-canonical RNAs produced in disease tissues. A recently introduced kind of transcriptome classifier operates entirely in a reference-free manner, relying on k-mers extracted from patient RNA-seq data. METHODS: In this paper, we set out to compare conventional and reference-free signatures in risk and relapse prediction of prostate cancer. To compare the two approaches as fairly as possible, we set up a common procedure that takes as input either a k-mer count matrix or a gene expression matrix, extracts a signature and evaluates this signature in an independent dataset. RESULTS: We find that both gene-based and k-mer based classifiers had similarly high performances for risk prediction and a markedly lower performance for relapse prediction. Interestingly, the reference-free signatures included a set of sequences mapping to novel lncRNAs or variable regions of cancer driver genes that were not part of gene-based signatures. CONCLUSIONS: Reference-free classifiers are thus a promising strategy for the identification of novel prognostic RNA biomarkers.


Subjects
Biomarkers, Tumor ; Prostatic Neoplasms/genetics ; Prostatic Neoplasms/mortality ; Transcriptome ; Algorithms ; Computational Biology/methods ; Gene Expression Profiling ; High-Throughput Nucleotide Sequencing ; Humans ; Male ; Prognosis ; Prostatic Neoplasms/pathology ; Recurrence ; Reproducibility of Results ; Supervised Machine Learning
9.
Life Sci Alliance ; 2(6), 2019 Dec.
Article in English | MEDLINE | ID: mdl-31732695

ABSTRACT

The use of RNA-sequencing technologies held a promise of improved diagnostic tools based on comprehensive transcript sets. However, mining human transcriptome data for disease biomarkers in clinical specimens is restricted by the limited power of conventional reference-based protocols relying on unique and annotated transcripts. Here, we implemented a blind reference-free computational protocol, DE-kupl, to infer yet unreferenced RNA variations from total stranded RNA-sequencing datasets of tissue origin. As a bench test, this protocol was powered for detection of RNA subsequences embedded into putative long noncoding (lnc)RNAs expressed in prostate cancer. Through filtering of 1,179 candidates, we defined 21 lncRNAs that were further validated by NanoString for robust tumor-specific expression in 144 tissue specimens. Predictive modeling yielded a restricted probe panel enabling more than 90% of true-positive detections of cancer in an independent The Cancer Genome Atlas cohort. Remarkably, this clinical signature made of only nine unannotated lncRNAs largely outperformed PCA3, the only prostate cancer lncRNA biomarker in use, in detection of high-risk tumors. This modular workflow is highly sensitive and can be applied to any pathology or clinical application.


Subjects
Prostatic Neoplasms/genetics ; Sequence Analysis, RNA/methods ; Transcriptome/genetics ; Biomarkers, Tumor/genetics ; Cohort Studies ; Gene Expression Regulation, Neoplastic/genetics ; Humans ; Male ; Prostate/pathology ; Prostatic Neoplasms/diagnosis ; RNA, Long Noncoding/genetics ; Retrospective Studies
10.
Genome Biol ; 18(1): 243, 2017 Dec 28.
Article in English | MEDLINE | ID: mdl-29284518

ABSTRACT

We introduce a k-mer-based computational protocol, DE-kupl, for capturing local RNA variation in a set of RNA-seq libraries, independently of a reference genome or transcriptome. DE-kupl extracts all k-mers with differential abundance directly from the raw data files. This enables the retrieval of virtually all variation present in an RNA-seq data set. This variation is subsequently assigned to biological events or entities such as differential long non-coding RNAs, splice and polyadenylation variants, introns, repeats, editing or mutation events, and exogenous RNA. Applying DE-kupl to human RNA-seq data sets identified multiple types of novel events, reproducibly across independent RNA-seq experiments.
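
The core idea of extracting k-mers with differential abundance directly from reads can be sketched as follows. This is only an illustration and not DE-kupl's implementation: the function names are hypothetical, and a plain log2 fold change on mean counts with a pseudocount stands in for a proper statistical test. Library-size normalization and canonical (strand-aware) k-mers are left out.

```python
from collections import Counter
from math import log2

def count_kmers(reads, k):
    """Count every k-mer occurring in a list of read sequences."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def differential_kmers(group_a, group_b, k, min_abs_log2fc=1.0, pseudocount=1.0):
    """Return {k-mer: log2 fold change (B over A)} for k-mers passing the cutoff.

    group_a, group_b: lists of samples, each sample being a list of reads.
    Assumption: a fold-change cutoff replaces a real differential test.
    """
    counts_a = [count_kmers(sample, k) for sample in group_a]
    counts_b = [count_kmers(sample, k) for sample in group_b]
    all_kmers = set().union(*counts_a, *counts_b)
    selected = {}
    for km in all_kmers:
        mean_a = sum(c[km] for c in counts_a) / len(counts_a)
        mean_b = sum(c[km] for c in counts_b) / len(counts_b)
        lfc = log2((mean_b + pseudocount) / (mean_a + pseudocount))
        if abs(lfc) >= min_abs_log2fc:
            selected[km] = lfc
    return selected
```

Because no annotation is consulted at any point, a k-mer from an unannotated transcript is treated exactly like one from a known gene, which is the property the abstract emphasizes.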


Subjects
Computational Biology/methods ; Genetic Variation ; RNA/genetics ; Software ; Alleles ; Gene Expression Profiling ; Gene Expression Regulation ; High-Throughput Nucleotide Sequencing ; Humans ; Polyadenylation ; RNA Splicing ; RNA, Antisense ; RNA, Long Noncoding/genetics ; RNA, Messenger/genetics ; Reproducibility of Results ; Sequence Analysis, RNA ; Transcriptome
11.
Stat Appl Genet Mol Biol ; 14(5): 413-28, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26461845

ABSTRACT

In co-expression analyses of gene expression data, it is often of interest to interpret clusters of co-expressed genes with respect to a set of external information, such as a potentially incomplete list of functional properties for which a subset of genes may be annotated. Based on the framework of finite mixture models, we propose a model selection criterion that takes into account such external gene annotations, providing an efficient tool for selecting a relevant number of clusters and clustering model. This criterion, called the integrated completed annotated likelihood (ICAL), is defined by adding an entropy term to a penalized likelihood to measure the concordance between a clustering partition and the external annotation information. The ICAL leads to the choice of a model that is more easily interpretable with respect to the known functional gene annotations. We illustrate the interest of this model selection criterion in conjunction with Gaussian mixture models on simulated gene expression data and on real RNA-seq data.


Subjects
Molecular Sequence Annotation ; Algorithms ; Cluster Analysis ; Data Interpretation, Statistical ; Gene Expression ; Gene Expression Profiling ; Models, Genetic ; Sequence Analysis, RNA
12.
PLoS One ; 8(10): e77503, 2013.
Article in English | MEDLINE | ID: mdl-24147011

ABSTRACT

Gene network inference from transcriptomic data is an important methodological challenge and a key aspect of systems biology. Although several methods have been proposed to infer networks from microarray data, there is a need for inference methods able to model RNA-seq data, which are count-based and highly variable. In this work we propose a hierarchical Poisson log-normal model with a Lasso penalty to infer gene networks from RNA-seq data; this model has the advantage of directly modelling discrete data and accounting for inter-sample variance larger than the sample mean. Using real microRNA-seq data from breast cancer tumors and simulations, we compare this method to a regularized Gaussian graphical model on log-transformed data, and a Poisson log-linear graphical model with a Lasso penalty on power-transformed data. For data simulated with large inter-sample dispersion, the proposed model performs better than the other methods in terms of sensitivity, specificity and area under the ROC curve. These results show the necessity of methods specifically designed for gene network inference from RNA-seq data.
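
The remark that a Poisson log-normal model accounts for variance larger than the mean can be checked with its closed-form moments. The sketch below assumes the simplest univariate form, Y | Z ~ Poisson(exp(Z)) with Z ~ Normal(mu, sigma2), which is not necessarily the hierarchical model of the paper; the function name is hypothetical. By the law of total variance, Var(Y) = m + (exp(sigma2) - 1) * m^2 with m = exp(mu + sigma2 / 2).

```python
from math import exp

def pln_mean_variance(mu, sigma2):
    """Mean and variance of a univariate Poisson log-normal count.

    Assumed model: Y | Z ~ Poisson(exp(Z)), Z ~ Normal(mu, sigma2).
    """
    mean = exp(mu + sigma2 / 2.0)                       # E[Y] = E[exp(Z)]
    variance = mean + (exp(sigma2) - 1.0) * mean ** 2   # Poisson part + log-normal part
    return mean, variance
```

When sigma2 is zero the model reduces to a plain Poisson (variance equal to the mean); any positive sigma2 adds a term that grows with the square of the mean, which is the overdispersion typical of RNA-seq counts.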


Subjects
Breast Neoplasms/genetics ; Gene Regulatory Networks ; MicroRNAs/genetics ; Models, Biological ; Models, Statistical ; Algorithms ; Breast Neoplasms/pathology ; Computer Simulation ; Female ; Humans ; ROC Curve
13.
Bioinformatics ; 29(17): 2146-52, 2013 Sep 01.
Article in English | MEDLINE | ID: mdl-23821648

ABSTRACT

MOTIVATION: RNA sequencing is now widely performed to study differential expression among experimental conditions. As tests are performed on a large number of genes, stringent false-discovery rate control is required at the expense of detection power. Ad hoc filtering techniques are regularly used to moderate this correction by removing genes with low signal, with little attention paid to their impact on downstream analyses. RESULTS: We propose a data-driven method based on the Jaccard similarity index to calculate a filtering threshold for replicated RNA sequencing data. In comparisons with alternative data filters regularly used in practice, we demonstrate the effectiveness of our proposed method to correctly filter lowly expressed genes, leading to increased detection power for moderately to highly expressed genes. Interestingly, this data-driven threshold varies among experiments, highlighting the interest of the method proposed here. AVAILABILITY: The proposed filtering method is implemented in the R package HTSFilter available on Bioconductor.
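
A data-driven threshold based on the Jaccard similarity index could look like the Python sketch below. It is a simplified reading of the abstract, not the HTSFilter R package: the function names are hypothetical, and the sketch assumes that, for a candidate threshold s, each replicate is reduced to the set of genes with normalized count above s, that similarity is averaged over all replicate pairs within each condition, and that the threshold with the highest average is retained.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity of two sets (taken as 1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def global_jaccard(counts, conditions, s):
    """Average within-condition Jaccard index at threshold s.

    counts: dict sample name -> list of per-gene normalized counts.
    conditions: dict sample name -> condition label.
    """
    scores = []
    for cond in sorted(set(conditions.values())):
        replicates = [name for name in counts if conditions[name] == cond]
        for x, y in combinations(replicates, 2):
            above_x = {g for g, c in enumerate(counts[x]) if c > s}
            above_y = {g for g, c in enumerate(counts[y]) if c > s}
            scores.append(jaccard(above_x, above_y))
    if not scores:
        raise ValueError("need at least one condition with two replicates")
    return sum(scores) / len(scores)

def choose_threshold(counts, conditions, candidates):
    """Pick the candidate threshold that maximizes replicate similarity."""
    return max(candidates, key=lambda s: global_jaccard(counts, conditions, s))
```

The intuition is that genes near the noise floor flip between "expressed" and "not expressed" across replicates, lowering the similarity, while a threshold that is too high starts discarding genuinely expressed genes.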


Subjects
Gene Expression Profiling/methods ; High-Throughput Nucleotide Sequencing/methods ; Sequence Analysis, RNA/methods ; Animals ; Humans ; Mice